Learning Rules from Distributed Data

Authors

  • Lawrence O. Hall
  • Nitesh V. Chawla
  • Kevin W. Bowyer
  • W. Philip Kegelmeyer
Abstract

In this paper a concern about the accuracy (as a function of parallelism) of a certain class of distributed learning algorithms is raised, and one proposed improvement is illustrated. We focus on learning a single model from a set of disjoint data sets, which are distributed across a set of computers. The model is a set of rules. The distributed data sets may be disjoint for any of several reasons. In our approach, the first step is to construct a rule set (model) for each of the original disjoint data sets. Then rule sets are merged until an eventual final rule set is obtained which models the aggregate data. We show that this approach compares to directly creating a rule set from the aggregate data and promises faster learning. Accuracy can drop off as the degree of parallelism increases. However, an approach has been developed to extend the degree of parallelism achieved before this problem takes over.
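The two steps described in the abstract (learn a rule set per disjoint data set, then merge rule sets until one remains) can be sketched roughly as follows. This is only an illustrative toy, not the authors' method: the abstract does not say how rules are learned or merged, so the single-attribute rule learner, the accuracy-based conflict resolution in `merge`, the pairwise merge order, and all names (`Rule`, `learn_rules`, `merge`, `learn_distributed`) are assumptions made for this example.

```python
from collections import Counter
from typing import NamedTuple


class Rule(NamedTuple):
    feature: int   # index of the attribute the rule tests
    value: str     # attribute value the rule matches
    label: str     # class predicted when the rule fires
    correct: int   # matched training examples that carry this label
    covered: int   # all matched training examples


def learn_rules(partition):
    """Toy learner (assumption): one rule per (feature, value) pair,
    predicting the majority class seen in this partition only."""
    counts = {}
    for features, label in partition:
        for i, v in enumerate(features):
            counts.setdefault((i, v), Counter())[label] += 1
    rules = []
    for (i, v), c in counts.items():
        label, correct = c.most_common(1)[0]
        rules.append(Rule(i, v, label, correct, sum(c.values())))
    return rules


def merge(a, b):
    """Hypothetical merge: rules with the same condition and label pool
    their counts; on a label conflict the more accurate rule is kept."""
    merged = {}
    for r in list(a) + list(b):
        key = (r.feature, r.value)
        old = merged.get(key)
        if old is None:
            merged[key] = r
        elif old.label == r.label:
            merged[key] = old._replace(correct=old.correct + r.correct,
                                       covered=old.covered + r.covered)
        elif r.correct / r.covered > old.correct / old.covered:
            merged[key] = r
    return list(merged.values())


def learn_distributed(partitions):
    """Learn one rule set per partition, then merge pairwise until a
    single rule set remains. The per-partition step is the part that
    could run in parallel, one partition per machine."""
    rule_sets = [learn_rules(p) for p in partitions]
    if not rule_sets:
        return []
    while len(rule_sets) > 1:
        nxt = [merge(rule_sets[i], rule_sets[i + 1])
               for i in range(0, len(rule_sets) - 1, 2)]
        if len(rule_sets) % 2:          # odd one out is carried forward
            nxt.append(rule_sets[-1])
        rule_sets = nxt
    return rule_sets[0]
```

Each partition here is a list of `(features, label)` pairs, where `features` is a tuple of attribute values. Because each local rule set only sees its own partition, a merged rule set built this way need not match the one learned from the aggregate data, which is the kind of gap the abstract is concerned with.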

Similar resources

INTEGRATED ADAPTIVE FUZZY CLUSTERING (IAFC) NEURAL NETWORKS USING FUZZY LEARNING RULES

The proposed IAFC neural networks have both stability and plasticity because they use a control structure similar to that of the ART-1 (Adaptive Resonance Theory) neural network. The unsupervised IAFC neural network is the unsupervised neural network which uses the fuzzy leaky learning rule. This fuzzy leaky learning rule controls the updating amounts by fuzzy membership values. The supervised IAFC ...

On the Complexity of Rule Discovery from Distributed Data

This paper analyses the complexity of rule selection for supervised learning in distributed scenarios. The selection of rules is usually guided by a utility measure such as predictive accuracy or weighted relative accuracy. Other examples are support and confidence, known from association rule mining. A common strategy to tackle rule selection from distributed data is to evaluate rules locally ...

Distributed Incremental Data Mining from Very Large Databases: A Rough Multiset Approach

This paper presents a mechanism for developing distributed learners for learning production rules from massive, dynamic, and distributed databases. The task of distributed learning is formulated by the concept of multiset decision tables that is based on rough multisets and information multisystems, which are derived from the theory of rough sets. We use the concept of partition of boundary set...

An Efficient Distributed Algorithm for Computing Association Rules

Data mining aims to efficiently discover previously unknown knowledge from large databases. It is highly demanding in numerous real-life applications, such as marketing strategy, financial forecast, etc. One of the fundamental problems in the area is the efficient computation of association rules. In this paper, we shall investigate this problem in a distributed database. Particularly, we will presen...

Research on Improved Distributed Association Rules Mining Algorithm in Hadoop Cloud Platform

In this paper, data mining of association rules, data mining of association rules on distributed databases and distributed encryption techniques are introduced. Secondly, existing algorithms of data mining of association rules on distributed databases are analyzed in detail, and then they are improved on aspects of efficiency and security, whereafter the algorithm of EP_ DMA is proposed, later ...

Publication date: 1999